Topic Set Size Design with Variance Estimates from Two-Way ANOVA

Author

Tetsuya Sakai

Abstract

Recently, Sakai proposed two methods for determining the topic set size n for a new test collection based on variance estimates from past data: the first method determines the minimum n that ensures high statistical power [22], while the second determines the minimum n that ensures tight confidence intervals [23]. These methods are based on statistical techniques described by Nagata [15]. While Sakai [22] used variance estimates based on one-way ANOVA, Sakai [23] used the 95th-percentile method proposed by Webber, Moffat and Zobel [38]. This paper reruns the experiments reported by Sakai [22, 23] using variance estimates based on two-way ANOVA [17]; these turn out to be slightly larger than their one-way ANOVA counterparts and substantially larger than the percentile-based ones. If researchers choose to “err on the side of over-sampling” as recommended by Ellis [10], the variance estimation method based on two-way ANOVA and the results reported in this paper are probably the ones they should adopt. We also establish empirical relationships between the two topic set size design methods, and discuss the balance between n and the pool depth pd using both methods.
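
As a hedged illustration of where "variance estimates from two-way ANOVA" start, the sketch below computes the mean squares of a two-way ANOVA without replication (runs x topics, one score per cell) from a run-by-topic score matrix. The abstract does not state exactly how the paper turns these into a single variance estimate, so the sketch stops at the mean squares; the function name and the dictionary keys are hypothetical labels, and the input is assumed to be a complete matrix with at least two runs and two topics.

```python
from typing import Dict, List


def two_way_anova_mean_squares(scores: List[List[float]]) -> Dict[str, float]:
    """Two-way ANOVA without replication (hypothetical helper, for illustration).

    scores[i][j] is the score of run i on topic j (m runs, n topics).
    Returns the mean squares for runs, topics, and the residual.
    """
    m, n = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (m * n)
    run_means = [sum(row) / n for row in scores]
    topic_means = [sum(row[j] for row in scores) / m for j in range(n)]

    # Partition the total sum of squares into run, topic, and residual parts.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_runs = n * sum((r - grand) ** 2 for r in run_means)
    ss_topics = m * sum((t - grand) ** 2 for t in topic_means)
    # With one observation per cell, the run-topic interaction cannot be
    # separated from error, so it stays in the residual.
    ss_residual = ss_total - ss_runs - ss_topics

    return {
        "ms_runs": ss_runs / (m - 1),
        "ms_topics": ss_topics / (n - 1),
        "ms_residual": ss_residual / ((m - 1) * (n - 1)),
    }


print(two_way_anova_mean_squares([[1, 2, 3], [2, 4, 6]]))
# {'ms_runs': 6.0, 'ms_topics': 4.5, 'ms_residual': 0.5}
```

Removing the topic effect before estimating the residual is what distinguishes this decomposition from a one-way ANOVA that treats runs as the only factor.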

Related articles

Classroom Simulation: Understanding One-way Random-effect Anova

The one-way random-effect ANOVA model is presented, and two simulated datasets are analyzed and discussed from three points of view: (1) the standard ANOVA table, F test, and method-of-moments estimates of variance components, which can lead to negative estimates; (2) maximum likelihood estimates of variance components; (3) Bayesian probability intervals for variance components based on flat p...

On Estimating Variances for Topic Set Size Design

Topic set size design is a suite of statistical techniques for determining the appropriate number of topics when constructing a new test collection. One vital input required for these techniques is an estimate of the population variance of a given evaluation measure, which in turn requires a topic-by-run score matrix. Hence, to build a new test collection, a pilot data set is a prerequisite. Re...

Comparison of sediment grain size analysis among two methods and three instruments using environmental samples

Sediment grain size is measured using a variety of methods, but comparisons of measurement methods on environmental samples are limited. Three instruments (Coulter LS230, Horiba LA900, and SediGraph 5100) utilizing two fundamentally different operating principles were employed to measure a single set of 20 different sediment samples collected at shelf depths from the Southern California Bight. ...

Topic Set Size Design with the Evaluation Measures for Short Text Conversation

Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P, all of whic...

Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths

IR evaluation measures are often compared in terms of rank correlation between two system rankings, agreement with the users’ preferences, the swap method, and discriminative power. While we view the agreement with real users as the most important, this paper proposes to use the Worst-case Confidence interval Width (WCW) curves to supplement it in test-collection environments. WCW is the worst-ca...

Publication date: 2014
